NCSU-SAS-Ning: Candidate Generation and Feature Engineering for Supervised Lexical Normalization
Abstract
User generated content often contains non-standard words that hinder effective automatic text processing. In this paper, we present a system we developed to perform lexical normalization for English Twitter text. It first generates candidates based on past knowledge and a novel string similarity measurement and then selects a candidate using features learned from training data. The system has a constrained mode and an unconstrained mode. The constrained mode participated in the W-NUT noisy English text normalization competition (Baldwin et al., 2015) and achieved the best F1 score.
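The two-stage flow described in the abstract (generate candidates, then select one) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `SEEN` and `VOCAB` tables are hypothetical toy data, `difflib.SequenceMatcher` stands in for the paper's string similarity measure (which the abstract does not specify), and the hand-set weights in `select` replace the features the system learns from training data.

```python
from difflib import SequenceMatcher

# Hypothetical "past knowledge": normalizations observed in training data.
SEEN = {"u": ["you"], "2moro": ["tomorrow"], "gr8": ["great"]}
# Hypothetical vocabulary of standard words.
VOCAB = ["you", "tomorrow", "great", "tonight", "together", "people"]


def similarity(a: str, b: str) -> float:
    """Stand-in similarity (difflib ratio), not the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def generate_candidates(token: str, threshold: float = 0.5) -> list[str]:
    """Union of remembered normalizations and similar vocabulary words."""
    candidates = list(SEEN.get(token, []))
    for word in VOCAB:
        if word not in candidates and similarity(token, word) >= threshold:
            candidates.append(word)
    # The token itself is always a candidate (it may already be standard).
    if token not in candidates:
        candidates.append(token)
    return candidates


def select(token: str, candidates: list[str]) -> str:
    """Pick the top-scoring candidate; weights are illustrative, not learned."""
    def score(cand: str) -> float:
        seen = 1.0 if cand in SEEN.get(token, []) else 0.0
        in_vocab = 0.5 if cand in VOCAB else 0.0
        return 2.0 * seen + in_vocab + similarity(token, cand)
    return max(candidates, key=score)


def normalize(tokens: list[str]) -> list[str]:
    """Normalize each token independently: generate, then select."""
    return [select(t, generate_candidates(t)) for t in tokens]
```

Calling `normalize(["c", "u", "2moro"])` on this toy data keeps `c` unchanged, since no candidate other than the token itself is generated, and maps the other two tokens to their remembered normalizations. A real system would replace the fixed weights with a trained model.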
Similar resources
Predicting word choice in affective text
Choosing the best word or phrase for a given context from among candidate near-synonyms, such as slim and skinny, is a difficult language generation problem. In this paper we describe approaches to solving an instance of this problem, the lexical gap problem, with a particular focus on affect and subjectivity; to do this we draw upon techniques from the sentiment and subjectivity analysis fields...
NCSU: Modeling Temporal Relations with Markov Logic and Lexical Ontology
As a participant in TempEval-2, we address the temporal relations task consisting of four related subtasks. We take a supervised machine-learning technique using Markov Logic in combination with rich lexical relations beyond basic and syntactic features. One of our two submitted systems achieved the highest score for the Task F (66% precision), untied, and the second highest score (63% precisio...
Re-examining Automatic Keyphrase Extraction Approaches in Scientific Articles
We tackle two major issues in automatic keyphrase extraction using scientific articles: candidate selection and feature engineering. To develop an efficient candidate selection method, we analyze the nature and variation of keyphrases and then select candidates using regular expressions. Secondly, we re-examine the existing features broadly used for the supervised approach, exploring different ...
A Supervised Approach to Keyword Extraction from Persian Documents Using Lexical Chains
Keywords are the main focal points of interest within a text, which intends to represent the principal concepts outlined in the document. Determining the keywords using traditional methods is a time consuming process and requires specialized knowledge of the subject. For the purposes of indexing the vast expanse of electronic documents, it is important to automate the keyword extraction task. S...
Overlap-based feature weighting: The feature extraction of Hyperspectral remote sensing imagery
Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose to use overlap-based feature weigh...
Publication date: 2015